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Abstract 

The Dynamic Time Warping (DTW) is a popular similarity measure between time 
series. The DTW fails to satisfy the triangle inequality and its computation requires 
quadratic time. Hence, to find closest neighbors quickly, we use bounding tech- 
niques. We can avoid most DTW computations with an inexpensive lower bound 
(LBJKeogh). We compare LB_Kcogh with a tighter lower bound (LB Jmproved). 
We find that LB Jniprovcd-based search is faster. As an example, our approach is 
2-3 times faster over random-walk and shape time series. 
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1 Introduction 

Dynamic Time Warping (DTW) was initially introduced to recognize spo- 
ken words [1], but it has since been applied to a wide range of informa- 
tion retrieval and database problems: handwriting recognition [2,3], signature 
recognition [4,5], image de-interlacing [6], appearance matching for security 
purposes [7], whale vocalization classification [8], query by humming [9,10], 
classification of motor activities [11], face localization [12], chromosome classi- 
fication [13], shape retrieval [14,15], and so on. Unhke the Euclidean distance, 
DTW optimally aligns or "warps" the data points of two time series (see 
Fig. 1). 

When the distance between two time series forms a metric, such as the Eu- 
clidean distance or the Hamming distance, several indexing or search tech- 
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niques have been proposed [16-20]. However, even assuming that we have a 
metric, Weber et al. have shown that the performance of any indexing scheme 
degrades to that of a sequential scan, when there are more than a few dimen- 
sions [21]. Otherwise — when the distance is not a metric or that the number 
of dimensions is too large — we use bounding techniques such as the Generic 
multimedia object indexing (GEMINI) [22]. We quickly discard (most) false 
positives by computing a lower bound. 




Fig. 1. Dynamic Time Warping example 



Ratanamahatana and Keogh [23] argue that their lower bound (LB_Keogh) 
cannot be improved upon. To make their point, they report that LB_Keogh 
allows them to prune out over 90% of all DTW computations on several data 
sets. 

We are able to improve upon LB_Keogh as follows. The first step of our two- 
pass approach is LB_Keogh itself. If this first lower bound is sufficient to dis- 
card the candidate, then the computation terminates and the next candidate 
is considered. Otherwise, we process the time series a second time to increase 
the lower bound (see Fig. 5). If this second lower bound is large enough, the 
candidate is pruned, otherwise we compute the full DTW. We show experi- 
mentally that the two-pass approach can be several times faster. 

The paper is organized as follows. In Section 4, we define the DTW in a generic 
manner as the minimization of the Ip norm (DTWp). Among other things, we 
show that if x and y are separated by a constant {x > c > y oi x < c < y) then 
the DTWi is the h norm (see Proposition 1). In Section 5, we compute generic 
lower bounds on the DTW and their approximation errors using warping en- 
velopes. In Section 6, we show how to compute the warping envelopes quickly. 
The next two sections introduce LB_Keogh and LBJmproved respectively. 
Section 9 presents the application of these lower bounds for multidimensional 
indexing whereas the last section presents an experimental comparison. 
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2 Conventions 



Time series are arrays of values measured at certain times. For simplicity, 
we assume a regular sampling rate so that time series are generic arrays of 
floating-point values. Time series have length n and are indexed from 1 to 
n. The Ip norm of x is = (Y,i for any integer < p < oo and 

1 1 *^ 1 1 oo — IHQiX^ I 'i \. The Ip distance between x and y is \\x — y\\p and it satisfies 
the triangle inequality \\x — z\\p < \\x — y\\p + \\y — z\\p for 1 < p < oo. The 
distance between a point x and a set or region S is d{x, S) = min^^gs d{x, y). 
Other conventions are summarized in Table 1. 

Table 1 

Frequently used conventions 



n length of a time series 

||x||p Ip norm 

DTWp monotonic DTW 

NDTWp non-monotonic DTW 

w DTW locality constraint 

U{x),L{x) warping envelope (see Section 5) 

H{x, y) projection of x on y (see Equation 1) 



3 Related Works 



Beside DTW, several similarity metrics have been proposed including the di- 
rected and general Hausdorff distance, Pearson's correlation, nonlinear elastic 
matching distance [24], Edit distance with Real Penalty (ERP) [25], Needleman- 
Wunsch similarity [26], Smith- Waterman similarity [27], and SimilB [28]. 

Boundary-based lower-bound functions sometimes outperform LB_Keogh [29]. 
We can also quantize [30] the time series. 

Sakurai et al. [31] have shown that retrieval under the DTW can be faster by 
mixing progressively finer resolution and by applying early abandoning [32] to 
the dynamic programming computation. 
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4 Dynamic Time Warping 



A many-to-many matching between the data points in time series x and the 
data point in time series y matches every data point Xj in x with at least one 
data point yj in y, and every data point in y with at least a data point in x. 
The set of matches j) forms a warping path V. We define the DTW as the 
minimization of the Ip norm of the differences {xi — yj}{i,j}er over all warping 
paths. A warping path is minimal if there is no subset F' of F forming an 
warping path: for simplicity we require all warping paths to be minimal. 

In computing the DTW distance, we commonly require the warping to remain 
local. For time series x and y, we align values Xi and yj only if |i — j| < w 
for some locahty constraint w > [1]. When w — 0, the DTW becomes the Ip 
distance whereas when w > n, the DTW has no locality constraint. The value 
of the DTW diminishes monotonically as w increases. (We do not consider 
other forms of locality constraints such as the Itakura parallelogram [33].) 

Other than locahty, DTW can be monotonic: if we align value Xi with value 
yj, then we cannot align value Xi+i with a value appearing before yj {yj/ for 
f <j)- 

We note the DTW distance between x and y using the Ip norm as DTWp(a;, y) 
when it is monotonic and as NDTWp(x, y) when monotonicity is not required. 

By dynamic programming, the monotonic DTW requires 0{wn) time. A typi- 
cal value oiw is n/10 [23] so that the DTW is in 0{n^). To compute the DTW, 
we use the following recursive formula. Given an array x, we write the suffix 
starting at position i, = Xi, Xi+i, . . . , x„. The symbol © is the exclusive or. 
Write Qij — DTWp(a;(j), so that DTWp(a;,y) = then 

if = = 

if \x(i) \ = © \y^j) \ = 



oo 



min(gj+ij, Qij+i, qi+ij+i] 



or \^ — j\ > w 
otherwise. 



For p = oo, we rewrite the preceding recursive formula with qij = DTWoo{x(i),y(j)), 
and Qij = max{\xi - yj\,mm{qi+ij,qij+i,qi+ij+i)) when 7^ 0, \y(j)\ 7^ 0, 
and |i — j| < w. 

We can compute NDTWi without locality constraint in O(nlogn) [34]: if the 
values of the time series are already sorted, the computation is in 0{n) time. 

We can express the solution of the DTW problem as an alignment of the two 
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initial time series (such as a; = 0, 1, 1, and y = 0,1, 0, 0) where some of the 
values are repeated (such as = 0, 1, 1, 0, and y' = 0, 1, 1, 0, 0). If we allow 
non-monotonicity (NDTW), then values can also be inverted. 

The non-monotonic DTW is no larger than the monotonic DTW which is itself 
no larger than the Ip norm: NDTMVp{x,y) < DTMVp{x,y) < \\x — y\\p for all 
< p < oo. 

The DTWi has the property that if the time series are value-separated, then 
the DTW is the li norm as the next proposition shows. In Figs. 3 and 4, we 
present value-separated functions: their DTWi is the area between the curves. 

Proposition 1 If x and y are such that either x>c>yorx<c<y for 
some constant c, then DTWi{x,y) — NDTW\{x,y) — ||x — y\\i. 



PROOF. Assume x > c> y. Consider the two aligned (and extended) time 
series x',y' such that NDTWi(a;,y) = — y'\\i. We have that x' > c > y' 
and NDTWi(a;,|/) = \\x' - y'\\, = Ei Wi - y'i\ = Ei - c\ + \c - yi\ = 
11^^' ~ c||i -|- ||c — y'lli > \\x — c\\i + \\c — y\\i — \\x — y\\i. Since we also have 
NDTWi(x,y) < DTWi(x,y) < \\x - y\\i, the equality follows. 



Proposition 1 does not hold for p > 1: DTW2((0, 0, 1, 0), (2, 3, 2, 2)) = y/l7 
whereas ||(0, 0,1,0) - (2, 3, 2, 2) Ha = v^. 



5 Computing Lower Bounds on the DTW 

Given a time series x, define U{x)i = maxk{xk\ \k — i\ < w} and L{x)i = 
mink{xk\ \k — i\ < w} for i = 1, . . . ,n. The pair U{x) and L[x) forms the 
warping envelope of x (see Fig. 2). We leave the locality constraint w implicit. 

The theorem of this section has an elementary proof requiring only the fol- 
lowing technical lemma. 

Lemma 1 Ifbe [a, c] then (c - 0)^* > (c - b)P + {b - for l<p< 00. 



PROOF. For p = 1, {c — by + {b — ay = {c — ay. For p > 1, by deriving 
(c — by + (6 — ay with respect to b, we can show that it is minimized when 
b = (c + a)/2 and maximized when b G {a, c}. The maximal value is (c — ay. 
Hence the result. 
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10 20 30 40 50 60 70 80 90 100 

Fig. 2. Warping envelope example 

The following theorem introduces a generic result that we use to derive two 
lower bounds for the DTW including the original Keogh-Ratanamahatana 
result [35]. Indeed, this new result not only implies the lower bound LB_Keogh, 
but it also provides a lower bound to the error made by LB_Keogh, thus 
allowing a tighter lower bound (LB .Improved). 

Theorem 1 Given two equal-length time series x and y and 1 < p < oo, then 
for any time series h satisfying Xi > hi > U{y)i or Xi < hi < L{y)i or hi = Xi 
for all indexes i, we have 

DTWp{x,yy > NDTWp{x,yf 

> \\x - h\\l + NDTWp{h,yY. 

Forp = oo, a similar result is true: DTWoo{x,y) > NDTWoo{x,y) > max(||x— 
h\UNDTW^{h,y)). 

PROOF. Suppose that 1 < p < oo. Let F be a warping path such that 
'NDTWp{x,yyP = I](jj)gr " Vjlp- By the constraint on h and Lemma 1, 
we have that \xi — yj\^ > \xi — hi\P + \hi — yjl^ for any {i,j) G F since hi G 
[mm{xi,yj),ma.x{xi,yj)]. Hence, we have that NDTWp(x, y)^ > J2{i,j)£r l^i ~ 
hi\^ + \hi — yj\P > \\x — h\\^ + I](ij)er \hi — yj\^- This proves the result since 
J2{i,j)er \hi — Vjl > NDTWp(/i, y). For p = oo, we have that 

NDTWoo(a;,i/) = max \xi - yj\ 

(«J)er 

< max max(|xi — hi\,\hi — yj\) 
(«j)er 

= max(||x -h\U NDTWoo(/i, y)), 

concluding the proof. 
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While Theorem 1 defines a lower bound — /i||p), the next proposition shows 
that this lower bound must be a tight approximation as long as /i is close to 
y in the Ip norm. 

Proposition 2 Given two equal-length time series x and y, and \ < p < oo 
with h as in Theorem 1, we have that \\x — h\\p approximates both DTWp{x,y) 
and NDTWp{x,y) within \\h — y\\p. 



PROOF. By the triangle inequality over /p, we have /i||p+||/;,— > 
y\\p. Since — |/||p > DTWp(a;, y), we have — — i/||p > DTWp(a:, y), 
and hence \\h — y\\p > DTWp{x, y) — \\x — h\\p. This proves the result since by 
Theorem 1, we have that DTWp(a;,|/) > NDTWp(a;,|/) > \\x - h\\p. 



This bound on the approximation error is reasonably tight. If x and y are 
separated by a constant, then DTWi(a;,|/) = \\x — y\\i by Proposition 1 and 
Ik - = \xi - Z/il = Ei - + \hi -yi\^\\x- h\\i + \\h - y||i. Hence, 
the approximation error is exactly — y||i in such instances. 



6 Warping Envelopes 



The computation of the warping envelope U{x),L{x) requires 0{nw) time 
using the naive approach of repeatedly computing the maximum and the min- 
imum over windows. Instead, we compute the envelope with at most 3n com- 
parisons between data-point values [36] using Algorithm 1. 



7 LB Keogh 



Let H{x, y) be the projection of x on y defined as 



H{x,y)i = < 



iU{y), ifx,>U{y), 
L{y)i if Xi < L{y)i 
otherwise. 



X 



for i = 1, 2, . . . , n. We have that H{x, y) is in the envelope of y. By Theorem 1 
and setting h = H{x,y), we have that NDTWp{x,y)P > \\x - H{x,y)\\P + 
NDTW p{H{x,y),y)P for 1 < p < oo. Write LB_Keogh (x, = \\x-H{x, 



(see Fig. 3), then LB_Keoghp(x, y) is a lower bound to NDTWp(a;, y) and thus 
DTWp{x,y). The following corollary follows from Theorem 1 and Proposi- 
tion 2. 
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Algorithm 1 Streaming algoritlim to compute tlie warping envelope using 
no more than Sn comparisons 

input a time series y indexed from 1 to n 

input some DTW locality constraint w 

return warping envelope ?7, L (two time series of length n) 

u, I ^ empty double-ended queues, we append to "back" 

append 1 to n and I 

for i in {2, . . . , n} do 
if i > w + 1 then 

Ui—ui ■* yfront(M)5 L/i—w ^ ?/front(/) 

if Hi > Vi-i then 
pop u from back 
while yi > yback(«) do 
pop u from back 

else 

pop I from back 
while yi < yback{0 do 
pop / from back 
append i to u and I 
if i = 2w + 1 + front (u) then 

pop u from front 
else if i = 2w + 1 + front(/) then 
pop I from front 
for i m {n + 1, . . . ,n + w} do 

Ui-w < yfront(n)) Li-u) ■* yfront(Z) 

if i-front(n)> 2w + 1 then 

pop u from front 
if i-front(/)> 2w + 1 then 

pop I from front 
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CoroUciry 1 Given two equal-length time series x and y and 1 < p < oo, 

then 

• LB_Keoghp{x,y) is a lower bound to the DTW: 

DTWp{x,y) > NDTWp{x,y) > LB_Keoghp{x,y)] 

• the accuracy of LB^Keogh is bounded by the distance to the envelope: 

DTWp{x,y) - LB_Keoghp{x,y) < \\max{U{y)i - yi,yi - L{y)i}i\\p 
for all X. 

Algorithm 2 shows how LB_Kcogh can be used to find a nearest neighbor 
in a time series database. We used DTWi for all implementations (see Ap- 
pendix C). The computation of the envelope of the query time series is done 
once (see hue 4). The lower bound is computed in lines 7 to 12. If the lower 
bound is sTifficicntly large, the DTW is not computed (see line 13). Ignoring 
the computation of the full DTW, at most (2A^ + 3)n comparisons between 
data points are required to process a database containing N time series. 

Algorithm 2 LB_Keogh-based Nearest- Neighbor algorithm 

1: input a time series y indexed from 1 to n 

2: input a set S of candidate time series 

3: return the nearest neighbor S to y in S under DTWi 

4: U,L<^ envelope(y) 

5: b ^ oo {b stores min2;g5 DTWi(x, y)} 

6: for candidate x 'm S do 

7: P {(3 stores the lower bound} 

8: for z G {l,2,...,n} do 

9: if Xi > Ui then 
10: I3^(3 + Xi-Ui 

11: else if Xi < Lj then 
12: I3^p + Li-Xi 

13: if /? < 6 then 

14: t ^ DTWi (a, c) {We compute the fuh DTW.} 
15: if t < 6 then 
16: h^t 
17: B^c 



8 LBJmproved 

In the previous Section, we saw that NDTWp(a;, |/)^ > LB_Keoghp(a;, y)^ + 
NDTWp(i/(a;,2/),2/)P for 1 < p < oo. In turn, we have NDTWp(if(a;, ?/), |/) > 
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Fig. 4. LBJmproved example: the area of the marked region is LB Jmprovedi(a;, y) 

LB_Keoghp(?/, y)). Hence, write 

LBJmprovedp(x, yY = LB_Keoghp(x, yY + LB_Keoghp(|/, H{x, y)Y 

for 1 < p < oo. By definition, we have LB Jmprovedp(x, y) > LB_Keoghp(x, y). 
Intuitively, whereas LB_Keoghp(x, y) measures the distance between x and the 
envelope of y, LB_Keoghp(|/, H{x, y)) measures the distance between y and the 
envelope of the projection oi x ony (see Fig. 4). The next corollary shows that 
LBJmproved is a lower bound to the DTW. 

Corollary 2 Given two equal-length time series x and y and 1 <p < oo, then 
LB_Improvedp{x, y) is a lower hound to the DTW: DTWp{x, y) > NDTWp{x, y) > 
LB_Improvedp{x, y) . 



PROOF. Recall that LB_Keoghp(x, = \\x - H{x,y)\\p. First apply Theo- 
rem 1: DTWp(x,?/)P > NDTWp(x,?/)P > LB_Keoghp(a;, ?/)P+NDTWp(i/(x, y), y)^. 
Apply Theorem 1 once more: NDTWp(?/, y)Y > LB_Keoghp(?/, H{x, y)Y- 
By substitution, we get DTWp(x,y)P > NDTWp(x,y)P > LB_Keoghp(x, y)^ + 
LB_Keoghp(?/, y))'' thus proving the result. 



Algorithm 3 shows how to apply LBJmproved as a two-step process (see 
Fig. 5). Initially, for each candidate x, we compute the lower bound LB_Keogh]^(x, y) 
(see lines 8 to 15). If this lower bound is sufficiently large, the candidate is dis- 
carded (see line 16), otherwise we add LB_Keogh^(?/, H{x, y)) to LB_Keogh]^(a:, ?/), 
in effect computing LB Jmproved]^(a:, y) (see lines 17 to 22). If this larger lower 
bound is sufficiently large, the candidate is finally discarded (see line 23). Oth- 
erwise, we compute the full DTW. If a is the fraction of candidates pruned by 
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LB_Keogh, at most (2A^ + 3)n + 5(1 — a)Nn comparisons between data points 
are required to process a database containing N time series. 



Algorithm 3 LB Jmproved-based Nearest- Neighbor algorithm 
1: input a time series y indexed from 1 to n 
2: input a set S of candidate time series 
3: return the nearest neighbor S to y in 6" under DTWi 
4: U,L ^ envelope(y) 
5: 6 <— cxD {6 stores min^^g^ DTWi(x, y)} 
6: for candidate x in 5 do 

7: copy x to x' {x' will store the projection of x on y} 

8: /3 {/3 stores the lower bound} 

9: for i G {l,2,...,n} do 
10: if Xi > Ui then 
11: (i^l3 + Xi-Ui 

12: x[ = Ui 

13: else if xi < Li then 
14: I3<- f5 + Li-Xi 

15: x[ = Li 

16: if P<h then 
17: [/', L' <— envelopc(a;') 

18: for i G {l,2,...,n} do 
19: if Vi > U[ then 

20: P^P + y^-Ul 

21: else if yi < L'^ then 

22: p^p + L'i-yi 

23: if /? < 6 then 

24: t ^ DTWi(o, c) {We compute the fuh DTW.} 

25: if i < 6 then 

26: b^t 
27: 5 ^ c 



9 Using a multidimensional indexing structure 

The running time of Algorithms 2 and 3 may be improved if we use a multi- 
dimensional index such as an R*-trcc [37]. Unfortunately, the performance of 
such an index diminishes quickly as the number of dimensions increases [21]. 
To solve this problem, several dimensionality reduction techniques are possible 
such as piecewise linear [38-40] segmentation. Following Zhu and Shasha [10], 
we project time series and their envelopes on a d-dimensional space using 
piecewise sums: Pd{x) = {J2ieCj where Ci,C2, ■ ■ ■ ,Cd is a disjoint cover 
of {l,2,...,n}. Unlike Zhu and Shasha, we do not require the intervals to 
have equal length. The li distance between Pd{y) and the minimum bound- 
ing hyperrectangle containing Pd{L{x)) and Pd{U{x)) is a lower bound to the 
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(a) We begin with y and its envelope 
L{y),U{y). 
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(b) We compare candidate x with the 
envelope L{y),U{y). 
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(c) The difference is LB_Keogh(a;,y). 
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(d) We compute x', the projection of 
X on the envelope L{y), U{y). 



LB Improved - LB Keogh I 
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(e) We compute the envelope of x'. (f) The difference between y and 

the envelope L{x'),U{x') is added to 
LB_Keogh to compute LB Jmproved. 

Fig. 5. Computation of LB Jmproved as in Algorithm 3 



DTWi{x,yy. 



DTWi(x,i/) > LB_Keoghi(x,i/) 

n 

= Y.d{x,,[L{y),,UiyU) 



1=1 

d 



>Y.d{Pd{x),. [P,{L{y)),,P,{U{y)),]). 
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For our experiments, we chose the cover Cj = [1 + {j — l)\n/ d\, j\n/ d\\ for 
J = 1, . . . , ci - 1 and Cd = [1 + (d - 1) \n/d\ , n]. 

We can summarize the Zhu-Shasha R*-tree algorithm as follows: 

(1) for each time series x in the database, add Pd{x) to the R*-tree; 

(2) given a query time series compute its envelope E = P(i{L{y)) , Pd^U {y)); 

(3) starting with b — oo, iterate over all candidate Pd{x) at a /i distance b 
from the envelope E using the R*-tree, once a candidate is found, update 
b with DTWi(x,y) and repeat until you have exhausted all candidates. 

This algorithm is correct because the distance between E and Pd{x) is a lower 
bound to DTWi(a;, y). However, dimensionality reduction diminishes the prun- 
ing power of LB_Keogh : d{E,Pd{x)) < LB_Keoghj(x, y). Hence, we propose 
a new algorithm (R*-Tree+LB_Keogh) where instead of immediately up- 
dating h with DTWi(x,|/), we first compute the LB_Keogh lower bound be- 
tween X and y. Only when it is less than 6, do we compute the full DTW. 
Finally, as a third algorithm (R*-Tree-KLB .Improved), we first compute 
LB_Keogh, and if it is less than 6, then we compute LB .Improved, and only 
when it is also lower than b do we compute the DTW, as in Algorithm 3. 
R*-tree+LB_Improved has maximal pruning power, whereas Zhu-Shasha 
R*-tree has the lesser pruning power of the three alternatives. 



10 Comparing Zhu-Shasha R*-tree, LB Keogh, and LB Improved 

In this section, we benchmark algorithms Zhu-Shasha R*-tree, R*-TREE-|- 
LB.Keogh, and R*-tree+LB_Improved. We know that the LBJmproved 
approach has at least the pruning power of the other methods, but does more 
pruning translate into a faster nearest-neighbor retrieval under the DTW dis- 
tance? 

We implemented the algorithms in C-I--I- using an external-memory R*-tree. 
The time series are stored on disk in a binary flat flle. We used the GNU 
GCC 4.0.2 compiler on an Apple Mac Pro, having two Intel Xeon dual-core 
processors running at 2.66 GHz with 2 GiB of RAM. No thrashing was ob- 
served. We measured the wall-clock total time. In all experiments, we bench- 
mark nearest-neighbor retrieval under the DTWi. By default, the locality con- 
straint w is set at 10% (w = n/10). To ensure reproducibility, our source code 
is freely available [41], including the script used to generate synthetic data 
sets. We compute the full DTW using a 0(n'u;)-time dynamic programming 
algorithm. 

The R*-tree was implemented using the Spatial Index library [42] . In informal 
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tests, we found that a projection on an 8-dimensional space, as described by 
Zhu and Shasha, gave good results: substantially larger {d > 10) or smaller 
{d < 6) settings gave poorer performance. We used a 4,096-byte page size and 
a 10-entry internal memory buffer. 

For R*-TREE+ LB_Keogh and R*-tree+LB_Improved, we experimented 
with early abandoning [32] to cancel the computation of the lower bound as 
soon as the error is too large. While it often improved retrieval time slightly for 
both LB_Keogh and LBJmproved, the difference was always small (less than 
~ 1%). One explanation is that the candidates produced by the Zhu-Shasha 
R*-tree are rarely poor enough to warrant efficient early abandoning. 

We do not report our benchmarking results over the simple Algorithms 2 
and 3. In almost all cases, the R*-tree equivalent — R*-tree+ LB_Keogh or 
R*-tree+LB_Improved — was at least slightly better and sometimes several 
times faster. 



10.1 Synthetic data sets 

We tested our algorithms using the Cylinder-Bell-Funnel [43] and Control 
Charts [44] data sets, as well as over two databases of random walks. We 
generated 256-samplc and 1 000-sample random-walk time series using the 
formula Xi — -|- A^(0, 1) and xi — 0. 

For each data set, we generated a database of 50 000 time series by adding ran- 
domly chosen items. Figs. 6, 7, 8 and 9 show the average timings and pruning 
ratio averaged over 20 queries based on randomly chosen time series as we con- 
sider larger and large fraction of the database. LBJmproved prunes between 
2 and 4 times more candidates than LB_Keogh. R*-tree-|-LB_Improved is 
faster than Zhu-Shasha R*-tree by a factor between and 6. 

We saw almost no performance gain over Zhu-Shasha R*-tree with simple 
time series such as the Cylindcr-Bell-Funnel or the Control Charts data sets. 
However, in these cases, even LBJmproved has modest pruning powers of 40% 
and 15%. Low pruning means that the computational cost is dominated by 
the cost of the full DTW. 

10.2 Shape data sets 

We also considered a large collection of time-series derived from shapes [45,46]. 
The first data set is made of heterogeneous shapes which resulted in 5 844 
1 024-sample times series. The second data set is an arrow-head data set 
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Fig. 6. Nearest-Neighbor Retrieval for the 256-sample random-walk data set 
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Fig. 7. Nearest-Neighbor Retrieval 
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Fig. 8. Nearest-Neighbor Retrieval for the Control Charts data set 

with of 15 000 251-sample time series. We extracted 50 time series from 
each data set, and we present the average nearest-neighbor retrieval times 
and pruning power as we consider various fractions of each database (see 
Figs. 10 and 11). The results are similar: LBJmp roved has twice the prun- 
ing power than LB_Keogh, R*-tree-|-LB_Improved is twice as fast as R*- 
tree+LB_Keogh and over 3 times faster than the Zhu-Shasha R*-tree. 
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Fig. 9. Nearest-Neighbor Retrieval for the 1000-sample random-walk data set 
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Fig. 10. Nearest-Neighbor Retrieval for the heterogeneous shape data set 
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Fig. 11. Nearest-Neighbor Retrieval for the arrow-head shape data set 
10.3 Locality constraint 



The locality constraint has an effect on retrieval times: a large value of w makes 
the problem more difficult and reduces the pruning power of all methods. 
In Figs. 12 and 13, we present the retrieval times for w = 5% and w = 
20%. The benefits of R*-tree-|-LB .Improved remain though they are less 
significant for small locality constraints. Nevertheless, even in this case, R*- 
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Fig. 12. Average Nearest-Neighbor Retrieval Time for the 256-sample random-walk 
data set 
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Fig. 13. Average Nearest-Neighbor Retrieval Time for the arrow-head shape data 
set 

TREE+LB .Improved can still be three times faster than Zhu-Shasha R*- 
tree. For all our data sets and for all values of w E {5%, 10%, 20%}, R*- 
TREE+LB .Improved was always at least as fast as the Zhu-Shasha R*-tree 
algorithm alone. 



11 Conclusion 



We have shown that a two-pass pruning technique can improve the retrieval 
speed by three times or more in several time-series databases. In our imple- 
mentation, LBJmproved required slightly more computation than LB_Keogh, 
but its added pruning power was enough to make the overall computation sev- 
eral times faster. Moreover, we showed that pruning candidates left from the 
Zhu-Shasha R*-tree with the full LB_Keogh alone — without dimensionality 
reduction — was enough to significantly boost the speed and pruning power. 
On some synthetic data sets, neither LB_Keogh nor LBJmproved were able 
to prune enough candidates, making all algorithms comparable in speed. 
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A Some Properties of Dynamic Time Warping 

The DTW distance can be counterintuitive. As an example, if x, y, z are three 
time series such that x < y < z pointwise, then it does not follow that 
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DTWp{x,z) > DTWp{z,y). Indeed, choose x = 7,0,1,0, y = 7,0,5,0, and 
z = 7,7,7,0, then DTWoo(-2,y) = 5 and DTWoo(^,x) = 1. Hence, we review 
some of the mathematical properties of the DTW. 

The warping path aligns Xi from time series x and yj from time series y if 
e r. The next proposition is a general constraint on warping paths. 

Proposition 3 Consider any two time series x and y. For any minimal warp- 
ing path, if Xi is aligned with yj, then either Xi is aligned only with yj or yj 
is aligned only with Xi. Therefore the length of a minimal warping path is at 
most 2n — 2 when n > 1. 



PROOF. Suppose that the result is not true. Then there is Xk,Xi and yi,yj 
such that Xk and Xi are aligned with yj, and yi and yj are aligned with Xj. 
We can delete {k,j) from the warping path and still have a warping path. A 
contradiction. 

Next, we show that warping path is no longer than 2n — 2. Let ni be the 
number of points in x aligned with only one point in y, and let n2 be the 
number of points in y ahgned with only one point in x. The cardinality of 
a minimal warping path is bounded by ni + If ni = n or n2 = n, then 
Ui = n2 = n and the warping path has cardinahty n which is no larger than 
2n — 2 for n > 1. Otherwise, ui <n — l and n2 < n — 1, and ui -\- n2 < 2n — 2. 



The next lemma shows that the DTW becomes the Ip distance when either x 
or y is constant. 

Lemma 2 For any < p < oo, if y = c is a constant, then NDTWp{x,y) = 
DTWpix,y) = \\x-y\\p. 

When p = oo, a stronger result is true: if y = x + c for some constant c, 
then NDTWoo(a;,y) = DTWoo(.«,|/) = \\x - y\\oo- Indeed, NDTWoo (a:, y) > 
I max(y) — max(a;)| = c = \\x — y\\oo > \\x — y\\oo which shows the result. This 
same result is not true for p < oo: for x — 0, 1, 2 and y — 1,2, 3, we have 
11^ ~ y\\p — ^ whereas DTWp{x,y) = {/2. However, the DTW is translation 
invariant: DTWp(a;, z) = DTWp(a; + 6, z+b) and NDTWp(x, z) = NDTWp(a;+ 
b,z-\-b) for any scalar b and < p < oo. 

In classical analysis, we have that > \\x\\p [47] for 1 < p < q < 

oo. A similar results is true for the DTW and it allows us to conclude that 
DTWp(x,7/) and NDTWp(a;,y) decrease monotonically as p increases. 

Proposition 4 For 1 < p < q < oo, we have that {2n-2y/P-'^/iDTWg{x, y) > 



22 



DTWp{x, y) where n is the length of x and y. The result also holds for the non- 
monotonic DTW. 

PROOF. Assume n > 1. The argument is the same for the monotonic or 

non-monotonic DTW. Given x, y consider the two ahgncd (and extended) time 
series x', y' such that DTWq{x, y) = \\x' — y'\\q. Let n^' be the length of x' and 
Uyi be the length of ij'. As a consequence of Proposition 3, we have rix' = Uy/ < 
2n — 2. From classical analysis, we have n]/,^ ^^'^\\x' — y'\\q > \\x' — y'\\p, hence 
\2n-2\^/P-^/i\\x'-y'\\g > \\x'-y'\\pOr |2n-2|Vp-V9DTWg(x,7/) > 
Since x', y' represent a valid warping path of x, y, then y'||p > DTWp(x, y) 
which concludes the proof. 



B The Triangle Inequality 

The DTW is commonly used as a similarity measure: x and y are similar if 
DTWp{x,y) is small. Similarity measures often define equivalence relations: 
A ~ y4 for aU A (refiexivity) , A B ^ B A (symmetry) and A ~ i? A 5 ~ 
C A C (transitivity) . 

The DTW is reflexive and symmetric, but it is not transitive. Indeed, consider 
the following time series: 

a: = 0,0, ...,0,0, 

^ V ' 

2m+l times 

r = 0,0, 0,0,6,0,0, ...,0,0, 

^ V ' ^ V/ ' 

m times m times 

Z = 0, e, e, . . . , e, e, 0. 
^ ^ ^ 

2m— 1 times 

We have that NDTWp(X, Y) = DTWp(X, Y) = |e|, NDTWp(y, Z) = DTWp{Y, Z) = 
0, NDTWp(X,Z) = DTWp(X,Z) = ^(2m- l)|e| for 1 < p < oo and 
w = m — 1. Hence, for e small and n 3> 1/e, we have that A ~ F and 
y ~ Z, but X rfj Z. This example proves the following lemma. 

Lemma 3 For 1 < p < oo and w > 0, neither DTWp nor NDTWp satis- 
fies a triangle inequality of the form d{x, y) + d{ii, z) > cd{x, z) where c is 
independent of the length of the time series and of the locality constraint. 

This theoretical result is somewhat at odd with practical experience. Casacu- 
berta et al. found no triangle inequality violation in about 15 million triplets 
of voice recordings [48]. To determine whether we could expect violations 
of the triangle inequality in practice, we ran the following experiment. We 
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used 3 types of 100-sample time series: white-noise times series defined by 
Xi = N{0, 1) where N is the normal distribution, random-walk time series 
defined by = + A^(0, 1) and Xi = 0, and the Cylinder-Bell-Funnel time 
series proposed by Saito [43]. For each type, we generated 100 000 triples of 
time series x,y,z and we computed the histogram of the function 

DTWp(,T,^) 

C{x,y,z) - 



DTWpix,y) + DTW,{y,z) 

for p = 1 and p = 2. The DTW is computed without time constraints. Over 
the white-noise and Cylinder-Bell-Funnel time series, we failed to find a single 
violation of the triangle inequality: a triple x,y,z for which C{x,y,z) > 1. 
However, for the random-walk time series, we found that 20% and 15% of the 
triples violated the triangle inequahty for DTWi and DTW2. 

The DTW satisfies a weak triangle inequality as the next theorem shows. 

Theorem 2 Given any 3 same-length time series x, y, z and 1 < p < 00, we 
have 

DTW,{x, y) + DTW,{y, z) > "{y, 

m.m[2w + l,n)^'P 

where w is the locality constraint. The result also holds for the non-monotonic 
DTW. 



PROOF. Let r and F' be minimal warping paths between x and y and be- 
tween y and z. Let F" = {{h 3,k)\{i, j) e F and {j,k) e F'}. Iterate through 
the tuples {i,j,k) in F" and construct the same-length time series x",y",z" 
from Xi, yj, and z^. By the locality constraint any match G F cor- 

responds to at most min(2w + l,n) tuples of the form G F", and 

similarly for any match {j,k) G F'. Assume 1 < p < 00. We have that 
\\x"-y"Z = E(ij,it)6r" k~%f < min(2«;+l,n)DTWp(x,y)^'and \\y"-zX^ 
J2{i,j,k)er" IVj ~ ^k\^ ^ min(2'u; -|- 1, n)DTWp(y, z^. By the triangle inequality 
in Ip, we have 

mm(2w + 1, n) (DTWp(x, y) + BTWp(y, z)) > \\x" - y% + \\y" - z"\\p 

> \\x" -z"\\p > DTWp{x,z). 

Forp = 00, max(ij-fc)er" ll^^i-l/illp = DTWoo(a;, y)^' and msiX(ij^k)er" \yj-Zk\'' = 
DTWoo(|/, zY, thus proving the result by the triangle inequality over l^. The 
proof is the same for the non-monotonic DTW. 



The constant min(2u' -|- 1, n)^/^ is tight. Consider the example with time series 
X, y, Z presented before Lemma 3. We have DTWp(X, Y) +DTWp(F, Z) = \t\ 
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and DTWp{X,Z) = ^(2^ + l)|e|. Therefore, we have 
DTWp(X, Y) + DTWp(y, Z) 



DTWp(X, Z) 



min(2-u; + 1, ny/P 



A consequence of this theorem is that DTWoo satisfies the traditional triangle 
inequality. 

Corollary 3 The triangle inequality d{x, y)+d{y, z) > d{x, z) holds for DTW^t 
and NDTWoo- 

Hence the DTWoo is a pseudometric: it is a metric over equivalence classes 
defined by a; ~ y if and only if DTWoo(a^, y) = 0. When no locality constraint 
is enforced {w >n), DTWoo is equivalent to the discrete Frechet distance [49]. 



C Which is the Best Distance Measure? 



The DTW can be seen as the minimization of the Ip distance under warping. 
Which p should we choose? Legrand et al. reported best results for chromosome 
classification using DTWi [13] as opposed to using DTW2. However, they did 
not quantify the benefits of DTWi. Morse and Patel reported similar results 
with both DTWi and DTW2 [50]. 

While they do not consider the DTW, Aggarwal et al. [51] argue that out of 
the usual Ip norms, only the li norm, and to a lesser extend the I2 norm, ex- 
press a qualitatively meaningful distance when there are numerous dimensions. 
They even report on classification-accuracy experiments where fractional Ip 
distances such as Zq.i and Zo.5 fare better. Prangois et al. [52] made the theo- 
retical result more precise showing that under uniformity assumptions, lesser 
values of p are always better. 

To compare DTWi, DTW2, DTW4 and DTWoo, we considered four different 
synthetic time-series data sets: Cylinder-Bell- Funnel [43] , Control Charts [44] , 
Waveform [53], and Wave+Noise [54]. The time series in each data sets have 
lengths 128, 60, 21, and 40. The Control Charts data set has 6 classes of time 
series whereas the other 3 data sets have 3 classes each. For each data set, we 
generated various databases having a different number of instances per class: 
between 1 and 9 inclusively for Cylinder-Bell- Funnel and Control Charts, and 
between 1 and 99 for Waveform and Wave+Noise. For a given data set and 
a given number of instances, 50 different databases were generated. For each 
database, we generated 500 new instances chosen from a random class and we 
found a nearest neighbor in the database using DTWp for p = 1, 2, 4, 00 and 
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(a) Cylinder-Bell-Funnel 
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(c) Waveform 
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(b) Control Charts 
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(d) Wave-I-Noise 



Fig. C.l. Classification accuracy versus the number of instances of each class in four 
data sets 

using a time constraint of w = n/10. When the instance is of the same class 
as the nearest neighbor, we considered that the classification was a success. 



The average classification accuracies for the 4 data sets, and for various num- 
ber of instances per class is given in Fig. C.l. The average is taken over 
25 000 classification tests (50 x 500), over 50 different databases. 

Only when there are one or two instances of each class is DTWqo competitive. 
Otherwise, the accuracy of the DTWoo-based classification does not improve 
as we add more instances of each class. For the Waveform data set, DTWi 
and DTW2 have comparable accuracies. For the other 3 data sets, DTWi has 
a better nearest-neighbor classification accuracy than DTW2. Classification 
with DTW4 has almost always a lower accuracy than either DTWi or DTW2. 

Based on these results, DTWi is a good choice to classify time series whereas 
DTW2 is a close second. 
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